Subspace Embeddings and \(\ell_p\)-Regression Using Exponential Random Variables

Authors

  • David P. Woodruff
  • Qin Zhang
Abstract

Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real p, 1 ≤ p < ∞, given a matrix M ∈ R^{n×d} with n ≫ d, with constant probability we can choose a matrix Π with max(1, n^{1−2/p}) · poly(d) rows and n columns so that simultaneously for all x ∈ R^d, ‖Mx‖_p ≤ ‖ΠMx‖_∞ ≤ poly(d) · ‖Mx‖_p. Importantly, ΠM can be computed in the optimal O(nnz(M)) time, where nnz(M) is the number of non-zero entries of M. This generalizes all previous oblivious subspace embeddings, which required p ∈ [1, 2] due to their use of p-stable random variables. Using our matrices Π, we also improve the best known distortion of oblivious subspace embeddings of ℓ1 into ℓ1 with Õ(d) target dimension in O(nnz(M)) time from Õ(d^3) to Õ(d^2), which can further be improved to Õ(d^{3/2}) log^{1/2} n if d = Ω(log n), answering a question of Meng and Mahoney (STOC, 2013). We apply our results to ℓp-regression, obtaining a (1 + ε)-approximation in O(nnz(M) log n) + poly(d/ε) time, improving the best known poly(d/ε) factors for every p ∈ [1, ∞) \ {2}. If one is interested in just a poly(d)-approximation rather than a (1 + ε)-approximation to ℓp-regression, a corollary of our results is that for all p ∈ [1, ∞) we can solve the ℓp-regression problem without using general convex programming; that is, since our subspace embeds into ℓ∞, it suffices to solve a linear programming problem. Finally, we give the first protocols for the distributed ℓp-regression problem for every p ≥ 1 which are nearly optimal in communication and computation.
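As a toy illustration of why exponential random variables yield an embedding into ℓ∞ (a hedged sketch, not the paper's actual Π construction; all names below are ours): for i.i.d. standard exponentials E_1, …, E_n, the quantity max_i |x_i|/E_i^{1/p} has the same distribution as ‖x‖_p / E^{1/p} for a single standard exponential E, so rescaling coordinates by exponentials and taking a max (an ℓ∞ norm) recovers the ℓp norm up to a constant factor with constant probability.

```python
import numpy as np

# Max-stability of exponentials: if E_1..E_n are i.i.d. Exp(1), then
#   max_i |x_i|^p / E_i  ~  ||x||_p^p / E   for a single E ~ Exp(1),
# because P(max_i |x_i|^p/E_i <= t) = prod_i exp(-|x_i|^p/t) = exp(-||x||_p^p/t).
# So the max of rescaled coordinates estimates ||x||_p up to a constant factor.

def exp_sketch_estimate(x, p, rng):
    """Estimate ||x||_p as max_i |x_i| / E_i^(1/p), E_i ~ Exp(1)."""
    e = rng.exponential(scale=1.0, size=x.shape)
    return np.max(np.abs(x) / e ** (1.0 / p))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
p = 1.5
true_norm = np.sum(np.abs(x) ** p) ** (1.0 / p)

# The median over independent trials concentrates within a small constant
# factor of the true norm (the estimator itself is only constant-factor
# accurate with constant probability, matching the poly(d) distortion regime).
ests = [exp_sketch_estimate(x, p, rng) for _ in range(2001)]
ratio = np.median(ests) / true_norm
print(f"median estimate / true l_{p} norm = {ratio:.3f}")
```

The median ratio lands near (1/ln 2)^{1/p}, a constant, consistent with the constant-probability, constant-factor guarantee the abstract describes.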


Similar articles


Tight Bounds for $\ell_p$ Oblivious Subspace Embeddings

An ℓp oblivious subspace embedding is a distribution over r × n matrices Π such that for any fixed n × d matrix A, Pr_Π[for all x, ‖Ax‖_p ≤ ‖ΠAx‖_p ≤ κ‖Ax‖_p] ≥ 9/10, where r is the dimension of the embedding, κ is the distortion of the embedding, and for an n-dimensional vector y, ‖y‖_p = (∑_{i=1}^n |y_i|^p)^{1/p} is the ℓp-norm. Another important property is the sparsity of Π, that is, the maximum numbe...
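A minimal numerical sanity check of this definition for the easiest case p = 2, using a dense Gaussian sketch (a classical ℓ2 oblivious subspace embedding, not the sparse constructions these papers study; the dimensions and names below are our own illustrative choices):

```python
import numpy as np

# Spot-check the embedding property for p = 2 with a dense Gaussian sketch.
# A is fixed; Pi is drawn obliviously (without looking at A). We measure the
# two-sided distortion max(||PiAx||/||Ax||, ||Ax||/||PiAx||) on random
# directions x; rescaling Pi turns this into the one-sided form
# ||Ax|| <= ||PiAx|| <= kappa ||Ax|| of the definition.
rng = np.random.default_rng(1)
n, d, r = 2000, 10, 400
A = rng.standard_normal((n, d))
Pi = rng.standard_normal((r, n)) / np.sqrt(r)  # i.i.d. N(0, 1/r) entries
PiA = Pi @ A

worst = 1.0
for _ in range(200):
    x = rng.standard_normal(d)
    distortion = np.linalg.norm(PiA @ x) / np.linalg.norm(A @ x)
    worst = max(worst, distortion, 1.0 / distortion)
print(f"worst observed distortion ~ {worst:.3f}")
```

With r = 400 rows for a d = 10-dimensional subspace, the observed distortion stays close to 1, illustrating the (κ, 9/10) guarantee; the papers above are about achieving this with far sparser Π and for p ≠ 2.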


Subspace Embeddings for the Polynomial Kernel

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a...


Robust blind methods using $\ell_p$ quasi norms

It was shown in previous work that some blind methods can be made robust to channel-order overmodeling by using the l1 norm or lp quasi-norms. However, no theoretical argument has been provided to support this statement. In this work, we study the robustness of blind subspace-based methods using l1 or lp quasi-norms. For the l1 norm, we provide the necessary and sufficient condition that the chann...


Experimental study for the comparison of classifier combination methods

In this paper, we compare the performances of classifier combination methods (bagging, modified random subspace method, classifier selection, parametric fusion) to logistic regression in consideration of various characteristics of input data. Four factors used to simulate the logistic model are: (a) combination function among input variables, (b) correlation between input variables, (c) varianc...



Journal:

Volume   Issue

Pages  -

Publication date: 2013